Exploiting Image-trained CNN Architectures for Unconstrained Video Classification
We conduct an in-depth exploration of different strategies for doing event
detection in videos using convolutional neural networks (CNNs) trained for
image classification. We study different ways of performing spatial and
temporal pooling, feature normalization, choice of CNN layers as well as choice
of classifiers. Making judicious choices along these dimensions led to a very
significant increase in performance over more naive approaches that have been
used until now. We evaluate our approach on the challenging TRECVID MED'14
dataset with two popular CNN architectures pretrained on ImageNet. On this
MED'14 dataset, our methods, based entirely on image-trained CNN features, can
outperform several state-of-the-art non-CNN models. Our proposed late fusion of
CNN- and motion-based features can further increase the mean average precision
(mAP) on MED'14 from 34.95% to 38.74%. The fusion approach achieves the
state-of-the-art classification performance on the challenging UCF-101 dataset.
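The late fusion mentioned above can be sketched as a weighted combination of per-class scores from independently trained classifiers. This is a minimal illustration, not the paper's exact scheme; the function name and the fusion weight `w` are assumptions:

```python
import numpy as np

def late_fuse(scores_cnn, scores_motion, w=0.5):
    """Blend per-class scores from a CNN-feature classifier and a
    motion-feature classifier. `w` is a tunable fusion weight (an
    assumption; in practice it would be chosen on validation data)."""
    return w * np.asarray(scores_cnn) + (1 - w) * np.asarray(scores_motion)

# Toy usage: three event classes scored by each model.
fused = late_fuse([0.9, 0.2, 0.1], [0.6, 0.5, 0.3], w=0.7)
print(fused)  # weighted average of the two score vectors
```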
The SURE-LET approach to image denoising
Denoising is an essential step prior to any higher-level image-processing task such as segmentation or object tracking, because the undesirable corruption by noise is inherent to any physical acquisition device. When the measurements are performed by photosensors, one usually distinguishes between two main regimes: in the first scenario, the measured intensities are sufficiently high and the noise is assumed to be signal-independent. In the second scenario, only a few photons are detected, which leads to a strong signal-dependent degradation. When the noise is considered signal-independent, it is often modeled as an additive independent (typically Gaussian) random variable, whereas, otherwise, the measurements are commonly assumed to follow independent Poisson laws, whose underlying intensities are the unknown noise-free measures. We first consider the reduction of additive white Gaussian noise (AWGN). Contrary to most existing denoising algorithms, our approach does not require an explicit prior statistical modeling of the unknown data. Our driving principle is the minimization of a purely data-adaptive unbiased estimate of the mean-squared error (MSE) between the processed and the noise-free data. In the AWGN case, such an MSE estimate was first proposed by Stein and is known as "Stein's unbiased risk estimate" (SURE). We further develop the original SURE theory and propose a general methodology for fast and efficient multidimensional image denoising, which we call the SURE-LET approach.
While SURE allows the quantitative monitoring of the denoising quality, the flexibility and the low computational complexity of our approach are ensured by a linear parameterization of the denoising process, expressed as a linear expansion of thresholds (LET). We propose several pointwise, multivariate, and multichannel thresholding functions applied to arbitrary (in particular, redundant) linear transformations of the input data, with a special focus on multiscale signal representations. We then transpose the SURE-LET approach to the estimation of Poisson intensities degraded by AWGN. The signal-dependent specificity of the Poisson statistics leads to the derivation of a new unbiased MSE estimate that we call "Poisson's unbiased risk estimate" (PURE) and requires more adaptive transform-domain thresholding rules. In a general PURE-LET framework, we first devise a fast interscale thresholding method restricted to the use of the (unnormalized) Haar wavelet transform. We then lift this restriction and show how the PURE-LET strategy can be used to design and optimize a wide class of nonlinear processing applied in an arbitrary (in particular, redundant) transform domain. We finally apply some of the proposed denoising algorithms to real multidimensional fluorescence microscopy images. This in vivo imaging modality often operates under low-illumination conditions and short exposure times; consequently, the random fluctuations of the measured fluorophore radiations are well described by a Poisson process degraded (or not) by AWGN. We experimentally validate this statistical measurement model and assess the performance of the PURE-LET algorithms in comparison with some state-of-the-art denoising methods. Our solution turns out to be very competitive both qualitatively and computationally, allowing for fast and efficient denoising of the huge volumes of data that are nowadays routinely produced in biomedical imaging.
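The core SURE-LET mechanism, namely writing the denoiser as a linear expansion of thresholding functions and finding the weights that minimize SURE in closed form, can be sketched in a few lines. This is a toy pointwise (pixel-domain) version under AWGN with known noise level; the two basis functions and the threshold scale `T` are illustrative assumptions, not the paper's actual transform-domain construction:

```python
import numpy as np

def sure_let_denoise(y, sigma, T=None):
    """Pointwise SURE-LET sketch for AWGN: the estimate is a linear
    combination of f1(y) = y and a smooth shrinkage f2(y) = y*exp(-y^2/2T^2),
    with weights chosen to minimize Stein's unbiased risk estimate (SURE).
    Minimizing ||F a - y||^2 + 2 sigma^2 a.div (SURE up to a constant)
    reduces to the linear system F^T F a = F^T y - sigma^2 div."""
    n = y.size
    if T is None:
        T = 2.0 * sigma  # heuristic threshold scale (an assumption)
    g = np.exp(-y**2 / (2 * T**2))
    F = np.stack([y, y * g], axis=1)            # n x 2 basis matrix
    # Divergence term: sum of pointwise derivatives of each basis function.
    div = np.array([float(n), (g * (1 - y**2 / T**2)).sum()])
    a = np.linalg.solve(F.T @ F, F.T @ y - sigma**2 * div)
    return F @ a

# Toy usage: denoise a noisy sinusoid.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 2048))     # clean signal
sigma = 0.5
y = x + sigma * rng.standard_normal(x.size)     # noisy observation
x_hat = sure_let_denoise(y, sigma)
mse_noisy = np.mean((y - x) ** 2)
mse_denoised = np.mean((x_hat - x) ** 2)
print(mse_noisy, mse_denoised)
```

Because the identity estimator (weights a = (1, 0)) belongs to the LET family, the SURE-optimal weights can only improve on the noisy input in terms of the risk estimate.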
A CURE for noisy magnetic resonance images: Chi-square unbiased risk estimation
In this article we derive an unbiased expression for the expected
mean-squared error associated with continuously differentiable estimators of
the noncentrality parameter of a chi-square random variable. We then consider
the task of denoising squared-magnitude magnetic resonance image data, which
are well modeled as independent noncentral chi-square random variables on two
degrees of freedom. We consider two broad classes of linearly parameterized
shrinkage estimators that can be optimized using our risk estimate, one in the
general context of undecimated filterbank transforms, and another in the
specific case of the unnormalized Haar wavelet transform. The resultant
algorithms are computationally tractable and improve upon state-of-the-art
methods for both simulated and actual magnetic resonance image data.
Comment: 30 double-spaced pages, 11 figures; submitted for publication.
From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding
Current state-of-the-art models for natural language understanding require a
preprocessing step to convert raw text into discrete tokens. This process,
known as tokenization, relies on a pre-built vocabulary of words or sub-word
morphemes. This fixed vocabulary limits the model's robustness to spelling
errors and its capacity to adapt to new domains. In this work, we introduce a
novel open-vocabulary language model that adopts a hierarchical two-level
approach: one at the word level and another at the sequence level. Concretely,
we design an intra-word module that uses a shallow Transformer architecture to
learn word representations from their characters, and a deep inter-word
Transformer module that contextualizes each word representation by attending to
the entire word sequence. Our model thus directly operates on character
sequences with explicit awareness of word boundaries, but without the bias of a
sub-word or word-level vocabulary. Experiments on various downstream tasks show
that our method outperforms strong baselines. We also demonstrate that our
hierarchical model is robust to textual corruption and domain shift.
Comment: Accepted to ACL 2023 Main Conference.
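The two-level flow described above, contextualizing characters within each word and then contextualizing the pooled word vectors across the sequence, can be illustrated with a toy NumPy sketch. This uses a single parameter-free self-attention step per level (the real model uses shallow and deep Transformers); the embedding table and pooling choice are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # embedding dimension (arbitrary for this sketch)

def attention(X):
    """Single-head self-attention with identity projections (toy version)."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X

char_emb = rng.standard_normal((128, D))  # one row per ASCII character code

def encode(sentence):
    words = sentence.split()
    # Intra-word module: mix characters within each word, then mean-pool.
    word_vecs = [
        attention(char_emb[[ord(c) % 128 for c in w]]).mean(axis=0)
        for w in words
    ]
    # Inter-word module: contextualize each word vector over the sequence.
    return attention(np.stack(word_vecs))

H = encode("open vocabulary models read characters")
print(H.shape)  # one contextual vector per word
```

Because the input is consumed character by character, any spelling (including out-of-vocabulary or misspelled words) maps to some representation, which is the robustness property the abstract highlights.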
User Loss -- A Forced-Choice-Inspired Approach to Train Neural Networks directly by User Interaction
In this paper, we investigate whether it is possible to train a neural
network directly from user inputs. We consider this approach to be highly
relevant for applications in which the point of optimality is not well-defined
and user-dependent. Our application is medical image denoising, which is
essential in fluoroscopy imaging. In this field, every user, i.e., every
physician, has different preferences, and image quality needs to be tailored to
each individual.
To address this important problem, we propose to construct a loss function
derived from a forced-choice experiment. In order to make the learning problem
feasible, we operate in the domain of precision learning, i.e., we base the
network architecture on traditional signal-processing methods in order to
reduce the number of trainable parameters. The algorithm used for this is a
Laplacian pyramid with only six trainable parameters.
In the experimental results, we demonstrate that models matching two image
experts who prefer different filter characteristics, trading off sharpness
against denoising, can be created using our approach. Moreover, models trained
for a specific user perform best on that user's test data. This approach opens
the way towards the implementation of direct user feedback in deep learning and
is applicable to a wide range of applications.
Comment: Accepted at BVM 2019; extended arXiv version with additional figures
and details.
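One natural way to turn a two-alternative forced choice into a training signal is a Bradley-Terry style loss: model the probability that the user prefers image A over image B as sigmoid(score(A) - score(B)) and minimize its negative log. The following toy sketch is an assumption-laden illustration, not the paper's exact loss; the six parameters stand in for the six Laplacian-pyramid weights, and the "user" is simulated by a hidden preference vector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
theta = np.zeros(6)                                       # trainable parameters
user_taste = np.array([1.0, 0.8, 0.5, 0.2, -0.2, -0.5])   # hidden preference

lr = 0.5
for _ in range(300):
    fa, fb = rng.standard_normal(6), rng.standard_normal(6)
    # Simulated forced choice: the user picks the image their taste scores higher.
    chosen, rejected = (fa, fb) if user_taste @ fa > user_taste @ fb else (fb, fa)
    p = sigmoid(theta @ (chosen - rejected))   # P(user prefers `chosen`)
    theta += lr * (1 - p) * (chosen - rejected)  # gradient step on -log p

# The learned parameters should align with the hidden user preference.
cos = theta @ user_taste / (np.linalg.norm(theta) * np.linalg.norm(user_taste))
print(round(cos, 2))
```

Training the same loop on choices from a different simulated user yields differently oriented parameters, mirroring the paper's finding that user-specific models fit their own user's test data best.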
LMDX: Language Model-based Document Information Extraction and Localization
Large Language Models (LLM) have revolutionized Natural Language Processing
(NLP), improving state-of-the-art on many existing tasks and exhibiting
emergent capabilities. However, LLMs have not yet been successfully applied to
semi-structured document information extraction, which is at the core of many
document processing workflows and consists of extracting key entities from a
visually rich document (VRD) given a predefined target schema. The main
obstacles to LLM adoption in that task have been the absence of layout encoding
within LLMs, critical for a high quality extraction, and the lack of a
grounding mechanism ensuring the answer is not hallucinated. In this paper, we
introduce Language Model-based Document Information Extraction and Localization
(LMDX), a methodology to adapt arbitrary LLMs for document information
extraction. LMDX enables the extraction of singular, repeated, and hierarchical
entities, both with and without training data, while providing grounding
guarantees and localizing the entities within the document. In particular, we
apply LMDX to the PaLM 2-S LLM and evaluate it on VRDU and CORD benchmarks,
setting a new state-of-the-art and showing how LMDX enables the creation of
high-quality, data-efficient parsers.
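The two obstacles the abstract names, layout encoding and grounding, can both be addressed by serializing each OCR line with quantized coordinates that double as segment identifiers the model can echo back. The sketch below is a hypothetical illustration of that idea, not LMDX's exact serialization format:

```python
def quantize(v, size, buckets=100):
    """Map a pixel coordinate into one of `buckets` coarse position bins."""
    return min(int(v / size * buckets), buckets - 1)

def serialize(lines, width, height):
    """Each entry is (text, x, y). Emit one coordinate-tagged segment per
    line: the xx|yy tag conveys layout to the LLM and serves as an
    identifier, so an extracted answer can be checked against the source
    segment (grounding) and localized on the page."""
    return "\n".join(
        f"{t} {quantize(x, width):02d}|{quantize(y, height):02d}"
        for t, x, y in lines
    )

# Toy document: two OCR lines with pixel positions (hypothetical values).
doc = [("Invoice #123", 50, 40), ("Total: $9.99", 60, 700)]
out = serialize(doc, width=600, height=800)
print(out)
```

An answer such as "Total: $9.99 10|87" can then be accepted only if that exact segment exists in the serialized input, which is one way to rule out hallucinated entities.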
A study of CP violation in B^± → DK^± and B^± → Dπ^± decays with D → K_S^0 K^± π^∓ final states
A first study of CP violation in the decay modes B^± → DK^± and B^± → Dπ^±, where D denotes a D^0 or D̄^0 meson reconstructed in the K_S^0 K^± π^∓ final state, is performed. The analysis uses the LHCb data set collected in pp collisions, corresponding to an integrated luminosity of 3 fb^-1. The analysis is sensitive to the CP-violating CKM phase γ through seven observables: one charge asymmetry in each of the four modes and three ratios of the charge-integrated yields. The results are consistent with measurements of γ using other decay modes.
Study of forward Z + jet production in pp collisions at √s = 7 TeV
A measurement of the Z+jet production cross-section in pp collisions at a centre-of-mass energy of √s = 7 TeV is presented. The analysis is based on an integrated luminosity recorded by the LHCb experiment. Results are shown with two jet transverse momentum thresholds, 10 and 20 GeV, for both the overall cross-section within the fiducial volume and for six differential cross-section measurements. The fiducial volume requires that both the jet and the muons from the Z boson decay are produced in the forward direction. The results show good agreement with theoretical predictions at the second-order expansion in the coupling of the strong interaction.